Bounding the Bias of Contrastive Divergence Learning
نویسندگان
چکیده
Optimization based on k-step contrastive divergence (CD) has become a common way to train restricted Boltzmann machines (RBMs). The k-step CD is a biased estimator of the log-likelihood gradient relying on Gibbs sampling. We derive a new upper bound for this bias. Its magnitude depends on k, the number of variables in the RBM, and the maximum change in energy that can be produced by changing a single variable. The last reflects the dependence on the absolute values of the RBM parameters. The magnitude of the bias is also affected by the distance in variation between the modeled distribution and the starting distribution of the Gibbs chain.
منابع مشابه
Average Contrastive Divergence for Training Restricted Boltzmann Machines
This paper studies contrastive divergence (CD) learning algorithm and proposes a new algorithm for training restricted Boltzmann machines (RBMs). We derive that CD is a biased estimator of the log-likelihood gradient method and make an analysis of the bias. Meanwhile, we propose a new learning algorithm called average contrastive divergence (ACD) for training RBMs. It is an improved CD algorith...
متن کاملPopulation-Contrastive-Divergence: Does consistency help with RBM training?
Estimating the log-likelihood gradient with respect to the parameters of a Restricted Boltzmann Machine (RBM) typically requires sampling using Markov Chain Monte Carlo (MCMC) techniques. To save computation time, the Markov chains are only run for a small number of steps, which leads to a biased estimate. This bias can cause RBM training algorithms such as Contrastive Divergence (CD) learning ...
متن کاملTraining RBMs based on the signs of the CD approximation of the log-likelihood derivatives
Contrastive Divergence (CD) learning is frequently applied to Restricted Boltzmann Machines (RBMs), the building blocks of deep believe networks. It relies on biased approximations of the log-likelihood gradient. This bias can deteriorate the learning process. It was claimed that the signs of most components of the CD update are equal to the corresponding signs of the log-likelihood gradient. T...
متن کاملLearning Multi-grid Generative ConvNets by Minimal Contrastive Divergence
This paper proposes a minimal contrastive divergence method for learning energy-based generative ConvNet models of images at multiple grids (or scales) simultaneously. For each grid, we learn an energy-based probabilistic model where the energy function is defined by a bottom-up convolutional neural network (ConvNet or CNN). Learning such a model requires generating synthesized examples from th...
متن کاملStochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence
Contrastive Divergence (CD) and Persistent Contrastive Divergence (PCD) are popular methods for training Restricted Boltzmann Machines. However, both methods use an approximate method for sampling from the model distribution. As a side effect, these approximations yield significantly different biases and variances for stochastic gradient estimates of individual data points. It is well known tha...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Neural computation
دوره 23 3 شماره
صفحات -
تاریخ انتشار 2011